1 research outputs found

    Development of a national-scale real-time Twitter data mining pipeline for social geodata on the potential impacts of flooding on communities

    Get PDF
    International audienceSocial media, particularly Twitter, is increasingly used to improve resilience during extreme weather events/emergency management situations, including floods: by communicating potential risks and their impacts, and informing agencies and responders. In this paper, we developed a prototype national-scale Twitter data mining pipeline for improved stakeholder situational awareness during flooding events across Great Britain, by retrieving relevant social geodata, grounded in environmental data sources (flood warnings and river levels). With potential users we identified and addressed three research questions to develop this application, whose components constitute a modular architecture for real-time dashboards. First, polling national flood warning and river level Web data sources to obtain at-risk locations. Secondly, real-time retrieval of geotagged tweets, proximate to at-risk areas. Thirdly, filtering flood-relevant tweets with natural language processing and machine learning libraries, using word embeddings of tweets. We demonstrated the national-scale social geodata pipeline using over 420,000 georeferenced tweets obtained between 20-29th June 2016. Highlights • Prototype real-time social geodata pipeline for flood events and demonstration dataset • National-scale flood warnings/river levels set 'at-risk areas' in Twitter API queries • Monitoring multiple locations (without keywords) retrieved current, geotagged tweets • Novel application of word embeddings in flooding context identified relevant tweets • Pipeline extracts tweets to visualise using open-source libraries (SciKit Learn/Gensim) Keywords Flood management; Twitter; volunteered geographic information; natural language processing; word embeddings; social geodata. Hardware required: Intel i3 or mid-performance PC with multicore processor and SSD main drive, 8Gb memory recommended. Software required: Python and library dependencies specified in Appendix A1.2.1, (viii) environment.yml Software availability: All source code can be found at GitHub public repositorie
    corecore